Detecting the Sensing Area of a Laparoscopic Probe in Minimally Invasive Cancer Surgery
In surgical oncology, it is challenging for surgeons to identify lymph nodes
and completely resect cancer even with pre-operative imaging systems like PET
and CT, because of the lack of reliable intraoperative visualization tools.
Endoscopic radio-guided cancer detection and resection has recently been
evaluated whereby a novel tethered laparoscopic gamma detector is used to
localize a preoperatively injected radiotracer. This can both enhance the
endoscopic imaging and complement preoperative nuclear imaging data. However,
gamma activity visualization is challenging to present to the operator because
the probe is non-imaging and does not visibly indicate where on the tissue
surface the activity originates. Initial attempts using segmentation or
geometric methods failed, but they led to the discovery that the problem could
be resolved by leveraging high-dimensional image features and probe position
information. To demonstrate the effectiveness of this solution, we designed and
implemented a simple regression network that successfully addressed the
problem. To further validate the proposed solution, we acquired and publicly
released two datasets captured using a custom-designed, portable stereo
laparoscope system. Through extensive experimentation, we demonstrated that our
method can successfully and effectively detect the sensing area, establishing a
new performance benchmark. Code and data are available at
https://github.com/br0202/Sensing_area_detection.git
Comment: Accepted by MICCAI 202
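The abstract's key idea is that the sensing area becomes predictable once image features are combined with probe position information. As a toy illustration of that input design only, the sketch below replaces the paper's regression network with closed-form ridge regression. The input encoding (an F-dimensional image feature vector plus two 3D points on the probe axis), the (u, v) pixel target, and every function name are hypothetical assumptions, not the released code.

```python
import numpy as np

# Hypothetical encoding: (N, F) image features + (N, 6) probe-axis points
# (two 3D points per frame). Target: (N, 2) pixel location of the sensing area.
# This is a linear stand-in for the paper's regression network.

def build_inputs(image_features: np.ndarray, probe_axis: np.ndarray) -> np.ndarray:
    """Concatenate per-frame image features with probe-axis coordinates."""
    return np.concatenate([image_features, probe_axis], axis=1)

def add_bias(X: np.ndarray) -> np.ndarray:
    """Append a constant column so the regressor can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_ridge(X: np.ndarray, Y: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Closed-form ridge regression W = (X^T X + lam I)^-1 X^T Y.

    For brevity the bias column is regularized too.
    """
    Xb = add_bias(X)
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def predict(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predict (N, 2) pixel coordinates of the sensing area."""
    return add_bias(X) @ W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 16))   # synthetic stand-in image features
    axis = rng.normal(size=(200, 6))     # synthetic probe-axis points
    X = build_inputs(feats, axis)
    W_true = rng.normal(size=(X.shape[1] + 1, 2))
    Y = add_bias(X) @ W_true             # synthetic (u, v) targets
    W = fit_ridge(X, Y)
    print(np.abs(predict(W, X) - Y).max())
```

A learned network can model the nonlinear relation between features, probe pose, and tissue geometry that this linear model cannot; the sketch only shows how both information sources enter a single regressor.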
Language-driven Scene Synthesis using Multi-conditional Diffusion Model
Scene synthesis is a challenging problem with several industrial
applications. Recently, substantial efforts have been directed to synthesize
the scene using human motions, room layouts, or spatial graphs as the input.
However, few studies have addressed this problem using multiple modalities,
especially in combination with text prompts. In this paper, we propose
language-driven scene synthesis, a new task that integrates text prompts, human
motion, and existing objects for scene synthesis. Unlike other single-condition
synthesis tasks, our problem involves multiple conditions and requires a
strategy for processing and encoding them into a unified space. To address the
challenge, we present a multi-conditional diffusion model, which differs from
the implicit unification approach of other diffusion models by explicitly
predicting the guiding points for the original data distribution. We
demonstrate that our approach is theoretically supported. Extensive
experimental results show that our method outperforms state-of-the-art
baselines and enables natural scene editing applications. The source code and
dataset can be accessed at https://lang-scene-synth.github.io/.
Comment: Accepted to NeurIPS 202
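For readers unfamiliar with diffusion models, the sketch below shows the standard DDPM forward (noising) process and how an estimate of the original sample x0 is recovered from a noise prediction. This is generic background under a linear beta schedule (an assumption); the paper's multi-conditional model, including its explicit prediction of guiding points, is not reproduced here, and the point-cloud shape used for the scene is hypothetical.

```python
import numpy as np

def make_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear beta schedule and the cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas_cumprod = np.cumprod(1.0 - betas)
    return betas, alphas_cumprod

def q_sample(x0, t, alphas_cumprod, noise):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

def predict_x0(x_t, t, alphas_cumprod, eps_hat):
    """Invert the forward process given a noise estimate eps_hat."""
    a_bar = alphas_cumprod[t]
    return (x_t - np.sqrt(1.0 - a_bar) * eps_hat) / np.sqrt(a_bar)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    _, a_bar = make_schedule()
    x0 = rng.normal(size=(32, 3))        # e.g. 32 hypothetical object points in 3D
    noise = rng.normal(size=x0.shape)
    x_t = q_sample(x0, 500, a_bar, noise)
    # A trained, condition-aware denoiser would supply eps_hat; passing the
    # true noise recovers x0 up to floating-point rounding.
    x0_hat = predict_x0(x_t, 500, a_bar, noise)
    print(np.abs(x0_hat - x0).max())
```

In a conditional model the denoiser receives the conditions (here text, human motion, and existing objects) as extra inputs; how those are unified is exactly where the paper departs from this baseline.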
Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation
Affordance detection presents intricate challenges and has a wide range of
robotic applications. Previous works have faced limitations such as the
complexities of 3D object shapes, the wide range of potential affordances on
real-world objects, and the lack of open-vocabulary support for affordance
understanding. In this paper, we introduce a new open-vocabulary affordance
detection method in 3D point clouds, leveraging knowledge distillation and
text-point correlation. Our approach employs pre-trained 3D models through
knowledge distillation to enhance feature extraction and semantic understanding
in 3D point clouds. We further introduce a new text-point correlation method to
learn the semantic links between point cloud features and open-vocabulary
labels. Extensive experiments show that our approach outperforms previous
works and adapts to new affordance labels and unseen objects. Notably, our
method achieves an improvement of 7.96% in mIoU over the baselines.
Furthermore, it offers real-time inference, which is well suited for robotic
manipulation applications.
Comment: 8 pages
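A common way to correlate point features with open-vocabulary labels is to embed each label text, take cosine similarities between every point feature and every label embedding, and normalize them with a temperature-scaled softmax. The sketch below shows that general pattern together with the mIoU metric quoted above. It is an assumption-labeled illustration, not the paper's text-point correlation module: the random vectors stand in for a real text encoder and point backbone, and the temperature of 0.07 is a hypothetical choice.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def text_point_probs(point_feats: np.ndarray, text_embeds: np.ndarray,
                     temperature: float = 0.07) -> np.ndarray:
    """Per-point probabilities over K free-form labels.

    point_feats: (N, D) point features; text_embeds: (K, D), one per label text.
    """
    logits = l2_normalize(point_feats) @ l2_normalize(text_embeds).T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical embeddings for labels such as "grasp", "pour", "cut"
    text_embeds = rng.normal(size=(3, 64))
    target = rng.integers(0, 3, size=500)
    point_feats = text_embeds[target] + 0.5 * rng.normal(size=(500, 64))
    pred = text_point_probs(point_feats, text_embeds).argmax(axis=1)
    print(mean_iou(pred, target, num_classes=3))
```

Because labels enter only through their text embeddings, a new affordance label needs only a new row in `text_embeds`, which is what makes the scheme open-vocabulary. Classes absent from both prediction and ground truth are skipped in `mean_iou` rather than counted as perfect or zero.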
Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
Affordance detection and pose estimation are of great importance in many
robotic applications. Their combination helps the robot gain an enhanced
manipulation capability, in which the generated pose can facilitate the
corresponding affordance task. Previous methods for affordance-pose joint
learning are restricted to a predefined set of affordances, which limits the
adaptability of robots in real-world environments. In this paper, we propose a
new method for language-conditioned affordance-pose joint learning in 3D point
clouds. Given a 3D point cloud object, our method detects the affordance region
and generates appropriate 6-DoF poses for any unconstrained affordance label.
Our method consists of an open-vocabulary affordance detection branch and a
language-guided diffusion model that generates 6-DoF poses based on the
affordance text. We also introduce a new high-quality dataset for the task of
language-driven affordance-pose joint learning. Extensive experimental results
demonstrate that our proposed method works effectively on a wide range of
open-vocabulary affordances and outperforms other baselines by a large margin.
In addition, we illustrate the usefulness of our method in real-world robotic
applications. Our code and dataset are publicly available at
https://3DAPNet.github.io
Comment: Project page: https://3DAPNet.github.i
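A 6-DoF pose combines a 3D rotation and a 3D translation. The sketch below assumes a hypothetical representation of a unit quaternion (w, x, y, z) plus a translation vector, which is one common choice; the abstract does not state which representation the diffusion model outputs. It packs the pose into a 4x4 homogeneous transform and applies it to points such as a gripper model.

```python
import numpy as np

def quat_to_rotmat(q: np.ndarray) -> np.ndarray:
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])

def pose_to_matrix(q: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Pack rotation and translation into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = quat_to_rotmat(q)
    T[:3, 3] = t
    return T

def apply_pose(T: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Transform (N, 3) points, e.g. a gripper model, by the pose T."""
    return points @ T[:3, :3].T + T[:3, 3]

if __name__ == "__main__":
    # Hypothetical generated pose: 90-degree rotation about z plus a 10 cm lift
    q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
    T = pose_to_matrix(q, np.array([0.0, 0.0, 0.1]))
    print(apply_pose(T, np.array([[1.0, 0.0, 0.0]])))
```

Normalizing the quaternion inside `quat_to_rotmat` keeps the result a valid rotation even if a generator outputs a slightly non-unit vector.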
CathSim: An Open-source Simulator for Autonomous Cannulation
Autonomous robots in endovascular operations have the potential to navigate
circulatory systems safely and reliably while decreasing the susceptibility to
human errors. However, training such robots involves numerous challenges, such
as long training times due to the sample inefficiency of machine learning
algorithms and safety issues arising from the interaction between the catheter
and the endovascular phantom. Physics
simulators have been used in the context of endovascular procedures, but they
are typically employed for staff training and generally do not conform to the
autonomous cannulation goal. Furthermore, most current simulators are
closed-source, which hinders the collaborative development of safe and reliable
autonomous systems. In this work, we introduce CathSim, an open-source
simulation environment that accelerates the development of machine learning
algorithms for autonomous endovascular navigation. We first simulate a
high-fidelity catheter and aorta together with a state-of-the-art endovascular
robot.
We then provide the capability of real-time force sensing between the catheter
and the aorta in the simulation environment. We validate our simulator by
conducting two different catheterisation tasks within two primary arteries
using two popular reinforcement learning algorithms, Proximal Policy
Optimization (PPO) and Soft Actor-Critic (SAC). The experimental results show
that using our open-source simulator, we can successfully train the
reinforcement learning agents to perform different autonomous cannulation
tasks.
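Of the two algorithms mentioned, PPO is defined by its clipped surrogate objective, which limits how far one update can move the policy away from the policy that collected the data. The sketch below computes that standard objective (to be maximized) for a batch of samples. It is textbook PPO, not CathSim code; the clip range of 0.2 is a common default assumed here, and SAC is not shown.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective L^CLIP (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and data-collecting policies; advantages: advantage estimates.
    """
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    adv = np.asarray(advantages)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Pessimistic bound: take the smaller of the two terms per sample.
    return float(np.mean(np.minimum(unclipped, clipped)))

if __name__ == "__main__":
    # Two hypothetical catheter actions whose probability doubled and halved.
    print(ppo_clip_objective([np.log(2.0), np.log(0.5)], [0.0, 0.0], [1.0, -1.0]))
```

Taking the minimum means the objective never rewards pushing the probability ratio beyond the clip range, which is one reason PPO trains stably in safety-sensitive simulated settings.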
Simultaneous Depth Estimation and Surgical Tool Segmentation in Laparoscopic Images
Surgical instrument segmentation and depth estimation are crucial steps toward greater autonomy in robotic surgery. Most recent works treat these problems separately, which makes deployment challenging. In this paper, we propose a unified framework for depth estimation and surgical tool segmentation in laparoscopic images. The network has an encoder-decoder architecture and comprises two branches that perform depth estimation and segmentation simultaneously. To train the network end-to-end, we propose a new multi-task loss function that effectively learns to estimate depth in an unsupervised manner while requiring only semi-ground truth for surgical tool segmentation. We conducted extensive experiments on different datasets to validate these findings. The results showed that the end-to-end network successfully improved the state of the art for both tasks while reducing deployment complexity.
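A multi-task loss of this kind is typically a weighted sum of an unsupervised depth term and a segmentation term. The sketch below assumes a common pattern that the abstract does not spell out. The depth term is a photometric L1 error between an image and a reconstruction of it, which is assumed to be produced elsewhere by warping the other stereo view with the predicted depth (not shown). "Semi-ground truth" is interpreted, as an assumption, as tool labels trusted only on the pixels marked in a `valid` mask. The weights `w_depth` and `w_seg` are hypothetical.

```python
import numpy as np

def photometric_l1(target: np.ndarray, reconstructed: np.ndarray) -> float:
    """Mean absolute difference between an image and its depth-based reconstruction."""
    return float(np.mean(np.abs(target - reconstructed)))

def masked_bce(prob: np.ndarray, label: np.ndarray, valid: np.ndarray,
               eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over pixels whose label is trusted (valid == 1)."""
    p = np.clip(prob, eps, 1.0 - eps)
    bce = -(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))
    valid = valid.astype(float)
    n = valid.sum()
    return float((bce * valid).sum() / n) if n > 0 else 0.0

def multitask_loss(target_img, recon_img, tool_prob, tool_label, valid,
                   w_depth: float = 1.0, w_seg: float = 1.0) -> float:
    """Weighted sum of the unsupervised depth term and the segmentation term."""
    return (w_depth * photometric_l1(target_img, recon_img)
            + w_seg * masked_bce(tool_prob, tool_label, valid))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8, 3))
    recon = img + 0.05 * rng.normal(size=img.shape)   # stand-in reconstruction
    prob = rng.random((8, 8))                          # predicted tool probability
    label = (rng.random((8, 8)) > 0.5).astype(float)
    valid = rng.random((8, 8)) > 0.3                   # pixels with trusted labels
    print(multitask_loss(img, recon, prob, label, valid))
```

Masking the segmentation term lets pixels without reliable labels contribute nothing, while the photometric term supervises depth everywhere without any depth ground truth.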
A Novel Medical Image Watermarking in Three-dimensional Fourier Compressed Domain
Digital watermarking, which protects the copyright of digital images, is a research hotspot in the field of image security. To ensure the security of medical image information, a novel medical image digital watermarking algorithm in the three-dimensional Fourier compressed domain is proposed. It is a robust zero-watermarking algorithm that takes advantage of the characteristics of the three-dimensional Fourier compressed domain, the encryption properties of a Legendre chaotic neural network, and the robustness of difference hashing. On the one hand, the original watermark image is encrypted with a Legendre chaotic neural network to enhance security. On the other hand, the zero-watermark is constructed using difference hashing in the three-dimensional Fourier compressed domain. The algorithm does not need to select a region of interest and avoids altering the content of the medical image. The specific implementation of the algorithm and the experimental results are given in the paper. The simulation results show that the algorithm is robust against both common and geometric attacks.
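The zero-watermarking idea can be sketched as follows. A binary feature is extracted from the image, the encrypted watermark is XORed with that feature, and the result is stored as a key, so the image itself is never modified. Several substitutions are made and labeled as assumptions. The "three-dimensional Fourier compressed domain" is approximated by a low-frequency block of the 3D DFT magnitude. The difference hash compares neighbouring coefficients in that block. A plain logistic map is swapped in for the Legendre chaotic neural network, with its initial value `x0` acting as a hypothetical secret key. This sketch makes no robustness claim.

```python
import numpy as np

def logistic_keystream(n: int, x0: float = 0.3456, r: float = 3.99) -> np.ndarray:
    """Binary keystream from the logistic map (stand-in for the Legendre chaotic neural network)."""
    x = x0
    bits = np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)
        bits[i] = 1 if x > 0.5 else 0
    return bits

def dft3d_dhash(volume: np.ndarray, k: int = 4) -> np.ndarray:
    """Difference hash over a low-frequency block of the 3D DFT magnitude (k**3 bits)."""
    spectrum = np.abs(np.fft.fftn(volume))
    low = spectrum[:k, :k + 1, :k]
    return (low[:, 1:, :] > low[:, :-1, :]).astype(np.uint8).ravel()

def register(volume: np.ndarray, watermark_bits: np.ndarray, x0: float = 0.3456) -> np.ndarray:
    """Build the zero-watermark key; the volume itself is left untouched."""
    feature = dft3d_dhash(volume)
    encrypted = watermark_bits ^ logistic_keystream(watermark_bits.size, x0)
    return feature ^ encrypted

def extract(volume: np.ndarray, zw_key: np.ndarray, x0: float = 0.3456) -> np.ndarray:
    """Recover the watermark bits from a (possibly attacked) volume and the key."""
    encrypted = dft3d_dhash(volume) ^ zw_key
    return encrypted ^ logistic_keystream(zw_key.size, x0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.random((16, 16, 8))                   # stand-in medical volume
    watermark = rng.integers(0, 2, size=64).astype(np.uint8)
    key = register(volume, watermark)
    print(np.mean(extract(volume, key) == watermark))  # fraction of bits recovered
```

Because verification only recomputes the hash, the watermark survives an attack exactly to the extent that the hash bits survive it. The robustness reported in the paper therefore depends on its actual feature construction, not on this sketch.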